General Architecture for Text Engineering, or GATE, is a Java suite of tools originally developed at the University of Sheffield beginning in 1995. It is now used worldwide by scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.〔Languages mentioned on http://gate.ac.uk/gate/plugins/ include Arabic, Bulgarian, Cebuano, Chinese, French, German, Hindi, Italian, Romanian and Russian.〕 GATE has been compared to NLTK, R and RapidMiner.〔"Open Source Text Analytics", web article by Seth Grimes.〕 As well as being widely used in its own right, it forms the basis of the KIM semantic platform.〔"KIM – a semantic platform for information extraction and retrieval", by Popov et al., Natural Language Engineering (2004), 10:375-392.〕

The GATE community and research team have been involved in several European research projects, including TAO, SEKT, NeOn, Media-Campaign, Musing, Service-Finder, LIRICS and KnowledgeWeb, as well as many other projects. As of May 28, 2011, 881 people were subscribed to the gate-users mailing list at SourceForge.net, and 111,932 downloads had been recorded since the project moved to SourceForge in 2005.〔GATE project page on SourceForge.〕 The paper "GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications"〔"GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications", by H. Cunningham, D. Maynard, K. Bontcheva and V. Tablan, in Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.〕 received over 800 citations in the seven years after its publication, according to Google Scholar.

In addition to the GATE User Guide,〔GATE User Guide.〕 books covering the use of GATE include "Building Search Applications: Lucene, LingPipe, and Gate" by Manu Konchady〔Konchady, Manu. Building Search Applications: Lucene, LingPipe, and Gate. Mustru Publishing, 2008.〕 and "Introduction to Linguistic Annotation and Text Analytics" by Graham Wilcock.〔"Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock.〕

== Features ==

GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System), a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part-of-speech tagger, a named-entity transducer and a coreference tagger. ANNIE can be used as-is to provide basic information extraction functionality, or serve as a starting point for more specific tasks. Languages handled in GATE include English, Spanish, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian and Russian.

Plugins are included for machine learning with Weka, MAXENT and SVM Light, together with a LIBSVM integration and an in-house perceptron implementation; for parsing with RASP; for managing ontologies such as WordNet; for querying search engines such as Google or Yahoo; for part-of-speech tagging with Brill or TreeTagger; and for many other tasks. Many external plugins are also available, for example for processing tweets.〔"TwitIE - An Open-Source Information Extraction Pipeline for Microblog Text".〕

GATE accepts input documents in various formats, such as plain text, HTML, XML, Microsoft Word (.doc) and PDF. Documents can also be kept in datastores based on Java serialization, Lucene, or PostgreSQL and Oracle databases (the latter via RDBMS storage over JDBC).

JAPE (Java Annotation Patterns Engine) transducers are used within GATE to manipulate annotations on text. JAPE is documented in the GATE User Guide,〔JAPE chapter in the GATE User Guide.〕 and Press Association Images has also written a tutorial.〔A JAPE tutorial from Press Association Images, UK.〕
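GATE's real programming interface is a Java library (GATE Embedded) and its pattern rules are written in JAPE; the code below uses neither. It is a minimal, self-contained Python sketch, under simplifying assumptions, of the ideas described above: ANNIE-style stages (a tokenizer, a gazetteer and a sentence splitter) each add stand-off annotations (a type, start and end character offsets, and a feature map) to one shared set, and a final pattern-action rule, loosely modelled on a JAPE transducer, matches a title Lookup followed by a capitalized Token and creates a Person annotation. Every name here (`Annotation`, `GAZETTEER`, `title_person_rule`, `run_pipeline`), the gazetteer entries and the simplified `orth` values are hypothetical illustrations, not GATE's actual API or resources.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    """A stand-off annotation: a typed span [start, end) over the text plus a feature map."""
    type: str
    start: int
    end: int
    features: Dict[str, str] = field(default_factory=dict)


Stage = Callable[[str, List[Annotation]], None]

# Hypothetical toy gazetteer list: token string -> majorType.
GAZETTEER = {"Dr": "title", "Mr": "title", "Sheffield": "city"}


def tokenizer(text: str, annots: List[Annotation]) -> None:
    """Add a Token per word or punctuation mark, with a simplified 'orth' feature."""
    for m in re.finditer(r"\w+|[^\w\s]", text):
        word = m.group()
        if word[0].isupper():
            orth = "upperInitial"
        elif word.islower():
            orth = "lowercase"
        else:
            orth = "other"
        annots.append(Annotation("Token", m.start(), m.end(), {"string": word, "orth": orth}))


def gazetteer(text: str, annots: List[Annotation]) -> None:
    """Add a Lookup over every Token whose string appears in the gazetteer list."""
    for tok in [a for a in annots if a.type == "Token"]:
        major = GAZETTEER.get(tok.features["string"])
        if major is not None:
            annots.append(Annotation("Lookup", tok.start, tok.end, {"majorType": major}))


def sentence_splitter(text: str, annots: List[Annotation]) -> None:
    """Naive splitter: a sentence runs from a non-space character up to and
    including the next '.', '!' or '?' (or to the end of the text)."""
    for m in re.finditer(r"\S[^.!?]*[.!?]?", text):
        annots.append(Annotation("Sentence", m.start(), m.end()))


def title_person_rule(text: str, annots: List[Annotation]) -> None:
    """Pattern -> action over existing annotations, loosely analogous to a JAPE rule
    whose left-hand side is {Lookup.majorType == title} {Token.orth == upperInitial}.
    The action adds a Person annotation spanning both tokens."""
    tokens = sorted((a for a in annots if a.type == "Token"), key=lambda a: a.start)
    titles = {(a.start, a.end) for a in annots
              if a.type == "Lookup" and a.features.get("majorType") == "title"}
    for first, second in zip(tokens, tokens[1:]):
        if (first.start, first.end) in titles and second.features["orth"] == "upperInitial":
            annots.append(Annotation("Person", first.start, second.end, {"rule": "TitlePerson"}))


def run_pipeline(text: str, stages: List[Stage]) -> List[Annotation]:
    """Run each stage in order; every stage reads and extends one shared annotation list."""
    annots: List[Annotation] = []
    for stage in stages:
        stage(text, annots)
    return annots


text = "Dr Smith visited Sheffield. He met Mr Jones."
annots = run_pipeline(text, [tokenizer, gazetteer, sentence_splitter, title_person_rule])
for a in annots:
    if a.type == "Person":
        print(a.type, repr(text[a.start:a.end]), a.features)
# Person 'Dr Smith' {'rule': 'TitlePerson'}
# Person 'Mr Jones' {'rule': 'TitlePerson'}
```

Because each stage only reads and extends the shared annotation set, stages can be swapped or appended (for example, a part-of-speech tagger before the rule) without changing the others. This is a simplified picture of why a pipeline like ANNIE can serve as a starting point for more specific tasks.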